Fina Short UEP239 Final Project

Suitability Analysis: Where Boston-based college grads (like me) should live

In this project I will execute a suitability analysis determining the most suitable ZIP Code Tabulation Area (ZCTA) for recent Boston-area college grads like me. The project focuses on the Boston Metropolitan Area, as defined by the boundary of the Boston Region Metropolitan Planning Organization (MPO). The analysis uses the following indicators to calculate and map suitability for this demographic:

Points of Interest: transit access

Feeding ourselves: fresh food access

Demographics of Interest: Age, Housing Price and Pop. Density

Analysis overview

Below is a rough outline of the analysis I will perform:

  1. Pinpoint area of focus > read in, visualize, and join Massachusetts base map data to define our field of analysis
  2. Measure train access > join MBTA nodes file to Boston zip codes data > normalize to index on 0-1 scale where 1 = high suitability (most T stations per zip code)
  3. Measure bus access > join MBTA bus stops file to Boston zip codes data > normalize to index on 0-1 scale where 1 = high suitability (most bus stops per zip code)
  4. Measure fresh food access > join farmers' market locations to Boston zip codes data > normalize to index on 0-1 scale where 1 = high suitability (densest concentration of markets)
  5. Assess median housing price > join census data to Boston zip codes data to calculate housing cost by zip code > normalize to index on 0-1 scale where 1 = high suitability (lowest rent)
  6. Calculate population age distribution > use ACS population data to determine percent of residents in their twenties by zip code > normalize to index on 0-1 scale where 1 = high suitability (highest percent)
  7. Calculate weighted and unweighted suitability maps > using an additive scale derived from all indexed indicators

How will we quantify a suitable "zip code tabulation area"? Using the following data:

Import Dependencies

Massachusetts & Boston Area Datasets

Now that we have the dependency imports we need, we can read in the data delineating our regions of analysis. First, we will use a state outline containing a detailed coastline to clip our zip codes file, thus providing a more accurate outline for zip code data that will match better with the Boston Region boundary.

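The ZCTA shapefile uses EPSG:4269 (NAD83), a geographic coordinate system measured in degrees rather than a projected one. We will reproject it to EPSG:6491, a projected CRS in meters and the newest standard recommended for mainland Massachusetts, so that areas and densities are calculated in consistent units.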

Next is a shapefile with polygon data for the Boston Metropolitan Area as defined by the boundary of the Boston Region Metropolitan Planning Organization (MPO).

Data for Availability & Accessibility | Transit Access

To map transit access, we'll index density of MBTA bus and subway stops per zip code. Let's start with the buses.

Of course, some zip codes are much larger than others, so prevalence of bus stops might not necessarily mean you're more likely to have good access. Let's use our new stop counts to calculate bus stop density as a better indicator of transit access.
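Here is a minimal sketch of the count-then-density step, assuming the spatial join has already tagged each bus stop with the ZCTA it falls in. The column names (`zcta`, `area_km2`, `bus_stops`, `bus_density`) and the toy values are hypothetical and are not taken from the project data.

```python
import pandas as pd

# Hypothetical output of a spatial join: one row per bus stop, tagged with its ZCTA.
stops = pd.DataFrame({
    "stop_id": [1, 2, 3, 4, 5, 6],
    "zcta": ["02139", "02139", "02139", "02139", "01908", "01908"],
})

# Hypothetical ZCTA table with area in square kilometers.
zips = pd.DataFrame({
    "zcta": ["02139", "01908", "02047"],
    "area_km2": [4.0, 2.5, 1.0],
})

# Count stops per ZCTA, then attach the counts to the ZCTA table.
counts = stops.groupby("zcta").size().rename("bus_stops").reset_index()
zips = zips.merge(counts, on="zcta", how="left")

# ZCTAs with no stops come back as NaN after the left merge; treat them as zero.
zips["bus_stops"] = zips["bus_stops"].fillna(0).astype(int)

# Density = stops per square kilometer.
zips["bus_density"] = zips["bus_stops"] / zips["area_km2"]
```

The left merge keeps zip codes without any stops in the table, so they can still receive a (zero) density and, later, an index value.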

Visualizing bus stop density by zip code:

It looks like bus stops are much more densely packed within many of the smaller zip codes concentrated at the center of Boston. Out of curiosity, let's look at the number of bus stops by zip code without accounting for area.

Hmm, there are indeed some areas that looked like they had better access than they truly do. Good thing we accounted for area. Now let's look at which zip codes have the best access based on our new density indicator. In the table below, the top 5 rows represent the areas with the best bus access, while the bottom 5 show those with the worst according to our density calculations.
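One way to build a best-five/worst-five table like this is to sort by density and stack the head and tail of the result. This is only a sketch on made-up values; the `density` frame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical density table with twelve placeholder zip codes.
density = pd.DataFrame({
    "zcta": [f"Z{i:02d}" for i in range(12)],
    "bus_density": [float(i) for i in range(12)],
})

# Sort from densest to sparsest, then keep the first and last five rows.
ranked = density.sort_values("bus_density", ascending=False)
top_and_bottom = pd.concat([ranked.head(5), ranked.tail(5)])
```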

Finally, it's time to create our bus stop suitability index by normalizing these densities on a scale ranging from 0 to 1. Here, 1 represents the area most densely concentrated with bus stops, and therefore characterized by our analysis as most suitable.
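The normalization described here can be sketched as min-max scaling: subtract the minimum and divide by the range. The helper name `min_max` and the sample densities are assumptions for illustration, not the project's actual code or values.

```python
import pandas as pd

def min_max(series: pd.Series) -> pd.Series:
    """Rescale a numeric series to the 0-1 range (1 = largest value)."""
    span = series.max() - series.min()
    if span == 0:
        # All values identical: there is nothing to rank, so return zeros.
        return pd.Series(0.0, index=series.index)
    return (series - series.min()) / span

# Hypothetical bus stop densities keyed by zip code.
bus_density = pd.Series([1.0, 0.8, 0.0], index=["02139", "01908", "02047"])
bus_index = min_max(bus_density)
```

Because the scale is relative to the observed minimum and maximum, a single very dense zip code will compress everyone else toward 0, which is worth keeping in mind when reading the maps.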

Now that we've found the spots with the most bus stops, we'll also evaluate the density of MBTA stations per zip code. There are far fewer T stations than bus stops. Yes, this analysis will include the Silver Line despite it technically being a bus.

Clearly, T stops are fewer and farther between than bus stops. Now we'll join T stop data with zip code data to see which zip codes have the densest concentration of these key transit access points.

In the table below we see the 5 zip codes most saturated with T stations, and 5 of the zip codes that do not contain any.

In order to develop a suitability indicator based on this data, we next normalize the indicator values into a suitability index ranging from 0 to 1. Locations with the most train stations should be at the highest end of this scale. Below is the new data frame with a column added for normalized values.

Points of Interest | Fresh Food Access (the farmers' market!)

Counting markets per zip code:

And now looking at density: it looks like you need to be closer to downtown if you want to actually be near a market rather than just having one somewhere in your zip code. We'll use density as the indicator instead.

Finally, we're normalizing our density values for the weighted map we'll eventually generate. Here, denser is better, so the densest zip code gets a value of 1.

First we will read in tabular American Community Survey census data to look at our variables "S0101_C02_006E" and "S0101_C02_007E", which represent the estimated percent of total zip code population aged 20-24 and 25-29, respectively. (We will fix those variable names in a second.) Personally, I'd prefer to live around other people in their twenties, but beyond that I'm unconcerned about what age my neighbors are, so we will not include any age ranges beyond the twenties in our index.

We need to do a little bit of cleanup before we can merge this into our zip codes gdf.

As I am primarily concerned about living near other people in their twenties, I will add these two fields together to get a single value that represents the total percent of each zip code's population in their twenties. The table below contains "twenties" as an indicator summarized by ZCTA.
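A rough sketch of the cleanup and the "twenties" calculation follows. Only the two ACS variable codes come from the text above; the `NAME` format, the renamed columns (`pct_20_24`, `pct_25_29`), and the sample values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical excerpt of the ACS table; downloaded values often arrive as strings.
acs = pd.DataFrame({
    "NAME": ["ZCTA5 02139", "ZCTA5 02047", "ZCTA5 01908"],
    "S0101_C02_006E": ["14.2", "0.0", "4.1"],
    "S0101_C02_007E": ["18.3", "0.0", "3.9"],
})

# Give the coded variables readable names.
acs = acs.rename(columns={
    "S0101_C02_006E": "pct_20_24",
    "S0101_C02_007E": "pct_25_29",
})

# Keep only the five-digit code as a string so leading zeros survive the merge.
acs["zcta"] = acs["NAME"].str[-5:]

# Convert percentages to numbers; entries that cannot be parsed become NaN.
for col in ["pct_20_24", "pct_25_29"]:
    acs[col] = pd.to_numeric(acs[col], errors="coerce")

# Total percent of residents aged 20-29.
acs["twenties"] = acs["pct_20_24"] + acs["pct_25_29"]
```

Keeping the ZCTA as a string matters in Massachusetts, where every zip code starts with 0 and would lose that digit if read as an integer.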

Visualizing and sorting out Boston's "twenties"

Below we will reveal where people tend to live in Boston if they're in their twenties.

Wow, this map and legend make it look like there could be neighborhoods in Boston where 60% of people are in their twenties, along with a significant number that appear to house a near-zero percentage of twentysomethings. Let's look at a table of values sorted by percent of total zip code population aged 20-29 to learn more.

From the lowest values at the bottom of this table, we've also learned that there are multiple zip codes in Boston where 0% of people are in their twenties! Some quick research shows that this includes the zip code "02047", a small coastal zip code with a total 2019 population of 97 people, as well as the 02203 zip code that covers just a few blocks near downtown's Government Center T stop. It seems that these results are plausible and we can proceed with our analysis.

Scaling to create our fourth suitability index value

Now we have created our fourth index, which is scaled from 0 to 1 where 1 means most suitable and highest percent of residents in their twenties.

Index 5: Boston Rent Medians

Now we'll use Census Bureau American Community Survey data on financial characteristics (table S2503) to see where in Boston monthly housing costs are highest. We'll do this by extracting a variable that represents median monthly housing costs, titled "S2503_C01_024E".

Visualizing Boston's Rent Medians

We're normalizing this again but reversing the scale so that a low median rent is highly ranked!
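Reversing the scale can be sketched by measuring each value's distance from the maximum instead of from the minimum. The rent figures below are invented for illustration and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical median monthly housing costs keyed by zip code.
rent = pd.Series([2400.0, 1500.0, 1800.0], index=["02139", "01908", "02021"])

# Reversed min-max scaling: the cheapest zip code scores 1, the priciest scores 0.
rent_index = (rent.max() - rent) / (rent.max() - rent.min())
```

This is equivalent to computing the ordinary 0-1 index and subtracting it from 1.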

Final Suitability Calculations

Now that we've created an index for each of our variables of interest, it's time to add them up. Since each of the five indexes ranges from 0 to 1, the sum gives every zip code a suitability score out of 5, which we can use to evaluate a neighborhood to settle in. Here's how we can calculate an unweighted index based on the indicators we've established thus far:
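As a sketch of the additive step, the five 0-1 indexes can be summed row by row. The column names and the two placeholder rows ("A" and "B") are hypothetical and stand in for the real zip code table.

```python
import pandas as pd

# Hypothetical table of the five normalized indexes for two placeholder zip codes.
indexes = pd.DataFrame(
    {
        "bus_index": [1.0, 0.2],
        "t_index": [0.9, 0.0],
        "market_index": [0.5, 0.1],
        "twenties_index": [0.8, 0.3],
        "rent_index": [0.1, 1.0],
    },
    index=["A", "B"],
)

# Unweighted suitability: each index counts equally, for a maximum score of 5.
indexes["suitability"] = indexes.sum(axis=1)

# Highest-scoring zip codes first.
ranked = indexes.sort_values("suitability", ascending=False)
```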

Time to visualize the final suitability table. We'll sort by indicator sum to see what zip codes have the highest suitability rating.

Weighted Suitability Calculator

The results are in: according to the unweighted index, I should move to 02139 (Cambridge)! I do already live in Cambridge, although not in that exact zip code, so it seems I've done my work effectively here. However, I'd like to finish the analysis by weighting the indexes according to what I see as my age group's preferences (maybe this time I'll get my own zip code in the top 5). As commuting by T is often more efficient than by bus, I'm going to weight T access more heavily than bus access. Rent will be the primary concern, and farmers' markets will be weighted least as they constitute more of a perk than a necessity.
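A weighted version can be sketched by multiplying each index by a weight before summing. The weights below are purely illustrative: they respect the ordering described above (rent highest, T above bus, markets lowest) but are not the values used in this project, and the weight for the twenties index is an arbitrary assumption. Column names and sample rows are also hypothetical.

```python
import pandas as pd

# Hypothetical normalized indexes for two placeholder zip codes.
indexes = pd.DataFrame(
    {
        "bus_index": [1.0, 0.2],
        "t_index": [0.9, 0.0],
        "market_index": [0.5, 0.1],
        "twenties_index": [0.8, 0.3],
        "rent_index": [0.1, 1.0],
    },
    index=["A", "B"],
)

# Illustrative weights only (they sum to 1): rent > T > bus = twenties > markets.
weights = pd.Series({
    "rent_index": 0.35,
    "t_index": 0.25,
    "bus_index": 0.15,
    "twenties_index": 0.15,
    "market_index": 0.10,
})

# Multiply each column by its weight, then sum across columns for each zip code.
weighted = indexes[weights.index].mul(weights, axis=1).sum(axis=1)
```

With weights that sum to 1, the weighted score stays on a 0-1 scale, so it is the rankings rather than the raw values that are comparable with the unweighted sum.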

Let's visualize.

In the table below, you can compare how zip codes scored under the two indexes and use the "weighting boost" column to see how top-ranked neighborhoods fared when the weights shifted.

OK, 02139 (Cambridge) is still on top, followed by 02446 and 02021 again. Now 01752 (another Cambridge zip code) has snuck in along with, somewhat shockingly, 01908, the zip code for Nahant. Despite all that metro Boston has to offer, maybe we should all just be moving to the beach instead.